SPST-Index: A Self-Pruning Splay Tree Index for Caching Database Cracking
Authors
Abstract
In database cracking, a database is physically self-organized into cracked partitions, with cracker indices speeding up access to these partitions. The AVL tree is the data structure of choice for implementing cracker indices. However, it is particularly cache-inefficient for range queries, because the nodes accessed only a few times (i.e., “Cold Data”) and the most accessed ones (i.e., “Hot Data”) are spread all over the index. In this paper, we present the Self-Pruning Splay Tree (SPST), a data structure that indexes database cracking and reorganizes “Hot Data” and “Cold Data” to speed up access to the cracked partitions. For every range query, the SPST rotates to the root the nodes pointing to the edges and to the middle value of the predicate interval. Eventually, the most accessed tree nodes remain close to the root, improving CPU and cache activity. On the other hand, the least accessed tree nodes remain close to the leaves and are pruned to improve updates. Our experimental evaluation shows 37% more instructions per cycle and 75.9% fewer L1 cache misses for lookup operations in the SPST compared to the AVL tree. Our data structure outperforms the AVL tree in lookup and maintenance costs for three major data access patterns: random, sequential, and skewed. The SPST outperforms the AVL tree by 4% even in the worst-case scenario of mixed workloads with lookups and batch updates.
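The rotation behaviour described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Node`, `splay`, `range_query`, and `prune` are hypothetical, the order in which the two interval edges and the middle value are splayed is an assumption, and the depth-based pruning rule is only a stand-in for whatever policy the SPST actually uses.

```python
class Node:
    """One index entry; in a cracker index it would also point to a partition."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def rotate_right(x):
    y = x.left
    x.left = y.right
    y.right = x
    return y

def rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    return y

def insert(root, key):
    """Plain BST insert, used here only to build a starting tree."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def splay(root, key):
    """Rotate the node holding key (or the last node on its search path) to the root."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:            # zig-zig
            root.left.left = splay(root.left.left, key)
            root = rotate_right(root)
        elif key > root.left.key:          # zig-zag
            root.left.right = splay(root.left.right, key)
            if root.left.right is not None:
                root.left = rotate_left(root.left)
        return root if root.left is None else rotate_right(root)
    if root.right is None:
        return root
    if key > root.right.key:               # zig-zig
        root.right.right = splay(root.right.right, key)
        root = rotate_left(root)
    elif key < root.right.key:             # zig-zag
        root.right.left = splay(root.right.left, key)
        if root.right.left is not None:
            root.right = rotate_right(root.right)
    return root if root.right is None else rotate_left(root)

def range_query(root, low, high):
    """Splay both edges of [low, high], then the middle value (assumed order)."""
    for key in (low, high, (low + high) // 2):
        root = splay(root, key)
    return root

def prune(root, max_depth, depth=0):
    """Assumed pruning rule: drop every node deeper than max_depth."""
    if root is None or depth > max_depth:
        return None
    root.left = prune(root.left, max_depth, depth + 1)
    root.right = prune(root.right, max_depth, depth + 1)
    return root

def inorder(root):
    """Return the keys in sorted order."""
    return [] if root is None else inorder(root.left) + [root.key] + inorder(root.right)

# Build a degenerate tree, then run one range query over [30, 70].
root = None
for k in range(10, 101, 10):
    root = insert(root, k)
root = range_query(root, 30, 70)
```

In this sketch, repeated range queries keep the keys of frequently queried intervals near the root, while `prune` discards nodes that have drifted far from it. Dropping an index node is tolerable in this setting on the assumption that a cracker index is auxiliary: a missing entry means a coarser partition boundary, not lost data.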
Similar resources
Database Cracking
This paper discusses the self-organized index maintenance approach of database cracking. It provides fundamental knowledge as well as the description of a basic implementation of database cracking. Alongside multiple cracking algorithms, the implementation also contains the code of a cracking index. By combining these two features the implementation allows the execution of range queries on ...
Database Cracking
Database indices provide a non-discriminative navigational infrastructure to localize tuples of interest. Their maintenance cost is taken during database updates. In this work we study the complementary approach, addressing index maintenance as part of query processing using continuous physical reorganization, i.e., cracking the database into manageable pieces. Each query is interpreted not onl...
Self-adjusting trees in practice for large text collections
Splay and randomized search trees (RSTs) are self-balancing binary tree structures with little or no space overhead compared to a standard binary search tree (BST). Both trees are intended for use in applications where node accesses are skewed, for example in gathering the distinct words in a large text collection for index construction. We investigate the efficiency of these trees for such voc...
B-Tree: An All-Purpose Index Structure for String Similarity Search Based on Edit Distance
Strings are ubiquitous in computer systems and hence string processing has attracted extensive research effort from computer scientists in diverse areas. One of the most important problems in string processing is to efficiently evaluate the similarity between two strings based on a specified similarity measure. String similarity search is a fundamental problem in information retrieval, database...
Graph Indexing: Tree + Delta >= Graph
Recent scientific and technological advances have witnessed an abundance of structural patterns modeled as graphs. As a result, it is of special interest to process graph containment queries effectively on large graph databases. Given a graph database G, and a query graph q, the graph containment query is to retrieve all graphs in G which contain q as subgraph(s). Due to the vast number of grap...